lsh, Nearest neighbor search in high dimensions

نویسنده

  • Edwin de Jonge
چکیده

Calculating distance pairs is O(n2) in memory and time and finding the nearest neighbor is O(n) in time. Tree indexing techniques like kd-tree [2] were developed to cope with large n, however their performance quickly breaks down for p > 3 [3]. Locality sensitive hashing (LSH) [3] is a technique for generating hash numbers from high dimensional data, such that nearby points have identical hashes. This enables efficient nearest neighbor search for (very) high dimensional data sets. It has been successfully applied to several problems including text similarity search [5].

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

LSH At Large - Distributed KNN Search in High Dimensions

We consider K-Nearest Neighbor search for high dimensional data in large-scale structured Peer-to-Peer networks. We present an efficient mapping scheme based on p-stable Locality Sensitive Hashing to assign hash buckets to peers in a Chord-style overlay network. To minimize network traffic, we process queries in an incremental top-K fashion leveraging on a locality preserving mapping to the pee...

متن کامل

SIMP: Accurate and Efficient Near Neighbor Search in Very High Dimensional Spaces

Near neighbor search in very high dimensional spaces is useful in many applications. Existing techniques solve this problem efficiently only for the approximate case. These solutions are designed to solve r-near neighbor queries only for a fixed query range or a set of query ranges with probabilistic guarantees and then, extended for nearest neighbor queries. Solutions supporting a set of query...

متن کامل

MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions

High-dimensional indexing has been very popularly used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, it is already known that well-known data structures like kd-tree, R-tree, and M-tree suffer in their performance over...

متن کامل

An Investigation of Practical Approximate Nearest Neighbor Algorithms

This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: c...

متن کامل

Reverse Nearest Neighbors Search in High Dimensions using Locality-Sensitive Hashing

We investigate the problem of finding reverse nearest neighbors efficiently. Although provably good solutions exist for this problem in low or fixed dimensions, to this date the methods proposed in high dimensions are mostly heuristic. We introduce a method that is both provably correct and efficient in all dimensions, based on a reduction of the problem to one instance of εnearest neighbor sea...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014